logo

1.Exploring a Time Series using Meta’s Prophet forecasting system.

1.1 Introduction

Essentially, the purpose of the project is to provide into the future trajectory of the UK’s GDP based on historical data between 1948-2023 and utilising time series forecasting techniques. The analysis and results from my project may be valuable for policymakers and economists in order to make informed decisions and projections regarding the UK’s economy.

My project contains historical GDP data of the United Kingdom from 1948-2023. This data has been obtained from the Office for National Statistics(UK).

1.2 Instructions with R Code

Step 1: Prophet Package

Ensure the package ‘prophet’ is installed.This will be necessary to be able to open prophet further down.

Step 2: Dataframe

Create the time series as an R dataframe. Label the time column (years) ‘ds’ and the data column of GDP in Millions ‘y’

library(readxl)
Gdp_of_the_uk_1948_2023 <- read_excel("statistic_id281744_gdp-of-the-uk-1948-2023.xlsx")
UK_GDP.df <- data.frame(ds =Gdp_of_the_uk_1948_2023$Years, y = Gdp_of_the_uk_1948_2023$GDPinMillions)

Step 3: Time Series

Now we must convert our data into a time series using the ts function. For confirmation, we use to class function as a following step.

UK_GDP_ts <- ts(UK_GDP.df$y, start = min(UK_GDP.df$ds), frequency = 1)
UK_GDP_ts
## Time Series:
## Start = 1948 
## End = 2023 
## Frequency = 1 
##  [1]  381461  394097  407268  422401  428734  452548  472254  491243  499555
## [10]  509379  516239  538278  572313  587589  593887  622566  658062  671975
## [19]  682419  701215  739468  753606  773977  801969  836938  891381  869155
## [28]  856049  881878  903587  941461  976566  956533  949961  968629 1008981
## [37] 1031385 1073547 1106912 1166905 1229650 1259191 1266248 1248461 1251525
## [46] 1279802 1323443 1355023 1390010 1458467 1508263 1554509 1621644 1663462
## [55] 1693271 1746551 1788931 1837927 1881785 1931094 1926735 1837817 1878960
## [64] 1900476 1929229 1963807 2026566 2071561 2111357 2167415 2197841 2233921
## [73] 2002489 2176203 2270764 2274050
class(UK_GDP_ts)
## [1] "ts"

Step 4: Plotting

Now we must open prophet from the library. Due to my data being in a yearly format, we must convert my ‘Year’ column into a Date format. Using the Prophet function will allow me to plot the time series forecasting model for my data.

library(prophet)
## Loading required package: Rcpp
## Loading required package: rlang
library(rlang)
library(Rcpp)
library(prophet)
UK_GDP.df$ds<-as.Date(as.character(UK_GDP.df$ds), format = "%Y")
m <- prophet(UK_GDP.df, 
             changepoint.prior.scale = 0.05, n.changepoints = 17)
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
# In this case, we disable the 'daily' and 'weekly' seasonality, as my data period is set 'yearly'.
f = prophet::make_future_dataframe(m,periods = 8, freq = "year")
Future_Years <- f
Predicted_Forecast <- predict(m, Future_Years)
plot(m,Predicted_Forecast)

# The graph displayed above plots the trend 'm' against the future 'Predicted_forecast' of my data for the next 8 years. We can clearly see that the future Predicted_forecast predicts that the trend will continue to stay fairly consistent. The graph suggests that by 2031, the UK's GDP will reach approximately 2,500,00 (£million).

2. Analysis & Results of my Data

2.1: Trend & Seasonality

prophet_plot_components(m, Predicted_Forecast)

# From observing my trend and seasonality in the two graphs above, we analyse that the 'trend' the UK's GDP from 1948-2023 remains positively correlated, as GDP has a directly proportionally increase with time (in years).

# However, interestingly we also observe that this is not the case for the yearly seasonality of the UK's GDP. Between January-June, there is high seasonality, indicating that there is larger fluctuation of GDP in these months compared to the rest of the year.We can observe from the graph two significant dips in March and April of each year.

# One assumption that could be made is that these troughs in March ad April each year correspond with the fact that in the UK the fiscal year operates from 1st April- 31st March. 
dyplot.prophet(m, Predicted_Forecast)
## Warning: `select_()` was deprecated in dplyr 0.7.0.
## ℹ Please use `select()` instead.
## ℹ The deprecated feature was likely used in the prophet package.
##   Please report the issue at <https://github.com/facebook/prophet/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
# This second graph above displays an interactive version of the Trend and Seasonality of my data.

2.2: Linear Regression

Regression_model<-lm(y~ds, UK_GDP.df)
summary(Regression_model)
## 
## Call:
## lm(formula = y ~ ds, data = UK_GDP.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -153895  -69772    4888   70261  158536 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.995e+05  1.235e+04   64.73   <2e-16 ***
## ds          7.244e+01  1.253e+00   57.80   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 87550 on 74 degrees of freedom
## Multiple R-squared:  0.9783, Adjusted R-squared:  0.978 
## F-statistic:  3341 on 1 and 74 DF,  p-value: < 2.2e-16
# My linear regression model displays a strong positively correlated relationship between years and GDP, indicating that GDP tends to increase by approximately £72.44 million each year. My model shows that Adjusted R-squared = 97.8% (variance) in GDP, suggesting that my model is a good fit for the data.

# Both the intercept and 'Year' coefficients are highly significant (p<0.001), highlighting the stability of the relationship between them. The residual standard error is approximately £87,550 million which represents the average deviation of observed GDP values from the predicted GDP values. 

# Conclusively,the regression model provides fairly reliable and valuable results.
plot<-plot(UK_GDP.df$ds,UK_GDP.df$y,xlab = "Year",ylab = "GDP in Millions")

# In the graph presented above, we can analyse that generally, as the years have gone on, GDP has increased alongside it. 

# However, if we observe the graph in more detail, we can see that there has been sharp bursts of GDP growth around 1976, 1980, 1990 and 2009. 

# An assumption that could be highlighted from the sharp growth in 2009, may had been influenced due to the previous 2007/2008 financial crisis having an effect on the UK's GDP. This consequently then resulted in a large growth in 2009 following the recession from the previous year.

# Additionally, one can observe that there are two anomalies that lie in the years 2020 and 2021. 

# A fair assumption that could be made is that these lower GDP values in 2020 and 2021 were impacted due to COVID-19.
plot(fitted(Regression_model),rstandard(Regression_model),xlab= "Fitted Values", ylab= "Standardised Residuals", main= "Residuals vs Fitted", type= "l")

3.Further Analysis

3.1: Comparison with CO2 Example

We proceed with the same process used in my own dataset, including linear regression.

Part 1-Dataframe and Plotting
co2.df = data.frame(  ds=zoo::as.yearmon(time(co2)),   y=co2)
m2<- prophet::prophet(co2.df)
## Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
# Again, we disable the 'daily' and 'weekly' seasonality, because the CO2 example has a time period set to 'yearly'.
f2 = prophet::make_future_dataframe(m2, periods=8, freq="quarter")
p = predict(m2, f2)
plot(m2,p)

# In the graph displayed above, we can comment on the comparison between the data in my project and the Co2 data example. Both my graph and the graph in the CO2 data display a positive relationship between the x and y variables. However, the CO2 example measure time 'monthly' whereas my data expresses time in 'years'. 

# In both data sets '8 periods' have been predicted into the future, but for my data 8 periods= 8 years, but for the CO2 example 8 period= 2 years. 
prophet_plot_components(m2, p)

# Both trend graphs from my dataset and the CO2 dataset are extremely similar. As  CO2 gas increases over time, so does the GDP in the UK.

# Opposingly, the graph of seasonality for the CO2 over the years, visually appears very different to my dataset on GDP over the years. In the CO2 example there are many more and larger fluctuations of data spread over time. With the largest dip being in September of each year. 
dyplot.prophet(m2,p)
# Displayed above is an interactive version of the Trend and Seasonality from the CO2 example dataset.
Part 2-Regression model for CO2 example
Regression_model2<-lm(y~ds, co2.df)
summary(Regression_model2)
## 
## Call:
## lm(formula = y ~ ds, data = co2.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.0399 -1.9476 -0.0017  1.9113  6.5149 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.250e+03  2.127e+01  -105.8   <2e-16 ***
## ds           1.308e+00  1.075e-02   121.6   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.618 on 466 degrees of freedom
## Multiple R-squared:  0.9695, Adjusted R-squared:  0.9694 
## F-statistic: 1.479e+04 on 1 and 466 DF,  p-value: < 2.2e-16
# The CO2 example has a slightly lower R-squared value (96.5%) compared to my GDP data (97.8%), indicating that the C02 model produces slightly more variance in their data.

# Both models show high significant coefficients for the intercept and the 'year' terms, which implies that both models have robust relationships between the time (years) and their respective variables (CO2 levels for the CO2 example and GDP for my dataset).
plot<-plot(co2.df$ds,co2.df$y,xlab = "Year",ylab = "PartsperMillion")

# The CO2 regression model graph conveys a slightly more positive linear relationship between its x and y variables than my data set. It should also be noted that there are more points plotted on this graph, which highlights that the CO2 example contains more observed data points compared my data.
plot(fitted(Regression_model2),rstandard(Regression_model2),xlab= "Fitted Values", ylab= "Standardised Residuals", main= "Residuals vs Fitted", type= "l")

plot(decompose(co2))

# Here, we see the decomposition of the CO2 example as a time series. Notice how 'Random' (which is the Noise), is just Noise= Observed Data- Trend- Seasonality in mathematical terms.

3.2: Concluding Remarks

# As we conclude this report, it can be said that both the CO2 example and my project data have strong similarities as well as differences. 
# I propose that my project has more reliable data. This is due to the fact that the C02 example only observes data up to the year 2000. Whereas, my dataset observes data until the year 2023. 

# The more recent the data, the better predictions that can be made for a time series, and thus producing more accurate and reliable results. Moreover, since my data set contains a smaller number of observed points, it is easier and clearer to see on graphs.

# Although, it should be made known that these two data sets are completely independent and uncorrelated to one another. So we are only able to compare these data sets to a certain extent, as the responsive variables are not the same and use completely different units (£millions for GDP and C02 levels in %).
logo